Numerical Algorithms, Libraries, and Software Frameworks for Future HPC Systems (Towards the Post Moore Era)

Author

  • Takeshi Iwashita
Abstract

Currently, the growth in the performance of supercomputers and high-end computing systems relies mainly on the increase in parallelism provided by many cores and special instruction sets. Consequently, researchers and developers of numerical algorithms and libraries must consider massive parallelism: at least O(10) threads and O(10) computational nodes should be utilized effectively. This will be a primary concern when developing novel algorithms and implementation methods for systems over the next few years. However, the situation will change within 10 years. Moore's law is predicted to end between 2025 and 2030, and when it does, we will face a turning point. Although it is apparent that we will not be able to further improve the flops of a single chip, it is hard to forecast future computer and processor architectures. However, bytes (memory and network bandwidth) will continue to increase. For example, three-dimensional stacking and silicon photonics technologies will increase the bandwidth between memory and processors and between computational nodes, and non-volatile memory will be used to reduce power requirements. To take full advantage of the increase in bytes provided by these technologies in practical analyses, however, we must change our computational paradigm and numerical algorithms. Over the last decade, algorithms that use more flops and fewer bytes have been preferable, but we must now focus on algorithms that use more bytes and fewer flops. Efficient use of these new technologies is not straightforward. Future algorithms should account for complex and deep memory hierarchies, the heterogeneity of memory latencies, and the efficient use of logic units attached to memory modules. For example, we should intensively investigate bandwidth- and latency-reducing algorithms, which make greater use of lower memory layers with higher bandwidths or reduce global synchronizations and communications. In real applications, flops per watt is more important than flops. Many real-world applications, including big data analyses, rely on improvements in flops per watt to drive social innovation. Fortunately, flops per watt can be improved even if Moore's law ends and the number of transistors that can operate for a fixed power input no longer increases. For specific applications or computational kernels, we can effectively use special instructions (e.g., SIMD), accelerators, and reconfigurable hardware (e.g., FPGAs) to increase the (effective) flops per watt. We should investigate novel implementation methods for these hardware systems, and associated algorithms for the typical computational kernels required by real-world applications. This research could also help to reduce the power consumption of real applications running on current or near-future systems.
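
To make the "more bytes, fewer flops" direction above concrete, the following C sketch (illustrative only; the function and variable names and the fusion example itself are assumptions, not taken from the article) shows one simple bandwidth-reducing transformation: two streaming loops are fused so that an intermediate array stays in registers instead of being written to and re-read from main memory, and the resulting single loop is also straightforward for a compiler to vectorize with SIMD instructions.

#include <stddef.h>

/* Unfused version: the intermediate array y is stored to main memory in the
 * first loop and read back in the second, adding memory traffic. */
void axpb_then_scale(size_t n, double a, double b, double c,
                     const double *x, double *y, double *z)
{
    for (size_t i = 0; i < n; ++i)      /* writes y: ~8n bytes */
        y[i] = a * x[i] + b;
    for (size_t i = 0; i < n; ++i)      /* re-reads y: ~8n bytes */
        z[i] = c * y[i];
}

/* Fused version: the same 3n flops, but the intermediate value stays in a
 * register, so the kernel touches main memory only for x and z. */
void axpb_then_scale_fused(size_t n, double a, double b, double c,
                           const double *x, double *z)
{
    for (size_t i = 0; i < n; ++i) {
        double t = a * x[i] + b;        /* intermediate kept in a register */
        z[i] = c * t;
    }
}

With 8-byte doubles, the unfused version moves roughly 32n bytes (read x, write y, re-read y, write z) for 3n flops, whereas the fused version moves roughly 16n bytes for the same 3n flops, so its arithmetic intensity is about twice as high; this is the kind of trade-off that bandwidth-reducing algorithms aim to exploit.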

Related articles

Framework for Development of Data-Movement Centric Applications

Currently, the memory wall is the most critical issue for the performance of modern HPC systems. Therefore, increasing compute intensity is essential for efficiently extracting the potential performance of such systems. Although strategies for the design and development of numerical algorithms and applications on post-Moore systems are not so different from current ones, we should pay more attention to data...

Developing a High-Performance Computing/Numerical Analysis Roadmap

A roadmap activity in the UK has leveraged US and European efforts for identifying the challenges and barriers in the development of high-performance computing (HPC) algorithms and software. The activity has identified the Grand Challenge to provide: 1. Algorithms and software that application developers can reuse in the form of high-quality, high performance, sustained software components, lib...

Common Computational Frameworks as Benchmarking Platforms

Computational Frameworks, supporting multiple algorithms and applications with common parallel, I/O, and other computational routines, can provide an excellent substrate for application-scale benchmarks. Such benchmarks are essential for estimating the effective performance of HPC systems for the purpose of system procurements. Cactus is extremely portable and its modularity supports a variety ...

Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective

Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software librar...

Inter-Agency Workshop on HPC Resilience at Extreme Scale

The following report summarizes the proceedings of a three-and-a-half day inter-agency workshop focused on the technical challenges of HPC resilience in the 2020 Exascale timeframe. The resilience problem is not specific to any particular program or agency; coordinated resilience solutions will be challenging because of the need for a truly integrated approach. The interagency workshop therefor...

Publication year: 2016